Abstract

Recently, span programs have been shown to be equivalent to quantum query algorithms. 
It is an open problem whether this equivalence can be utilized in order to come up with new 
quantum algorithms. We address this problem by providing span programs for some linear 
algebra problems. 

We develop a notion of a high level span program that abstracts away the loading of input vectors
into a span program. Then we give a high level span program for the rank problem. The last
section of the paper deals with reducing a high level span program to an ordinary span program
that can be solved using known quantum query algorithms.

1 Introduction 

Span programs, introduced by Karchmer and Wigderson in [15], are a certain way of defining Boolean
functions. Initially, they were used over finite fields, and applications included, in particular, a
log-space analogue of the complexity class inclusion NP ⊆ ⊕P, and secret sharing schemes.

The connection between span programs and quantum computation was realized in the course of
research on quantum algorithms for formula evaluation. This line of research was initiated by the
papers [H] by Farhi et al. and [3] by Ambainis et al. on evaluating AND-OR trees. Reichardt and Špalek
applied span programs over the complex numbers while extending the set of allowed gates to all
three-variable Boolean functions [35]. The key feature of span programs that makes them especially
useful for evaluating formulae is the ease with which they compose.

Later, it was even shown that span programs, measured by a newly defined complexity measure,
the witness size, and quantum query algorithms, measured by the number of queries, are
essentially equivalent. This was done by Reichardt, at first up to a logarithmic factor [pU];
the logarithmic factor was later removed in [21]. The lower bound was proven by showing that the
generalized adversary bound [M] by Høyer et al. is dual to the witness size of a span program in
the sense of semi-definite programming. With the quantum algorithm for evaluating span programs
in hand, this proves the equivalence.

Although any Boolean function can be optimally (in the number of queries) evaluated by a
span program, it is still an open problem to come up with a good quantum algorithm based on
span program evaluation. Until now, the only examples have been formula evaluation algorithms,
as in the already cited [22] or the more recent [24]. However, span programs differ significantly
from other models of quantum computation, which gives hope that they could provide some insights
into the construction of efficient quantum algorithms.

In this paper, we set out to extend the family of algorithms based on span programs. It
seems natural to start with linear algebra problems, since they are most in the spirit of span
programs. We take a first step in this direction by designing, firstly, an algorithm that is almost
a restatement of a span program: deciding whether a fixed vector is in the span of given vectors.
Secondly, we use this algorithm to solve the rank problem.
In the rank problem, we are given an $n\times m$ matrix $A$ and an integer $0 < r \le n$. The task is to
detect whether $\operatorname{rank} A \ge r$. The most important special case is $r = m = n$. It is known
as the determinant problem and consists in distinguishing a singular matrix from a non-singular
one. Of course, there is nothing special about the zero eigenvalue: if $A$ is a square matrix, one can
detect whether it has eigenvalue $\lambda$ by considering $A - \lambda I$ as an input to the determinant
problem. This is a common building block for many quantum algorithms. For instance, the quantum
walk approach by Szegedy [23] consists in deciding whether a specific unitary transformation has
eigenvalue 1. Also, the optimal quantum algorithm for span programs themselves tests a matrix for
having a specific eigenvalue [3T].

Dorn and Thierauf show an $\Omega(n^2)$ lower bound on the quantum query complexity of the determinant
(and, hence, the rank) problem [9]. This could be expected since, in general, a very small change
suffices to turn a singular matrix into a non-singular one. We, however, consider a promise problem,
with a gap between the allowed matrices of rank $< r$ and those of rank $\ge r$.

Let $c_r(A)$ be the quadratic mean of the reciprocals of the $r$ largest singular values $\sigma_1 \ge \cdots \ge \sigma_r$ of $A$:

$$ c_r(A) = \sqrt{\frac{1}{r}\sum_{i=1}^{r}\frac{1}{\sigma_i^2}}. $$

This value is defined for all matrices of rank at least $r$, and is set to infinity for all other matrices.
One of the main contributions of the paper is the following result.

Algorithm 1. The rank problem can be solved in $O(\sqrt{r(n-r+1)}\,LT)$ quantum queries with the
promise that every entry of the input matrix $A$ is bounded by 1 in absolute value, and every input
matrix of rank at least $r$ satisfies $c_r(A) \le L$. Here $T$ is the cost of loading an $n\times m$ matrix
into a span program.

Loading a matrix into a span program is a subroutine we deal with in Section 5. It is similar
to Hamiltonian simulation in ordinary quantum algorithms.

In particular, Algorithm 1 can be used to solve the determinant problem for $n\times n$ matrices
in $O(\sqrt{n}\,c_n(A)\,T)$ queries, where $A$ is the worst matrix among all allowed non-singular matrices.
The value $c_n(A)$, which we also denote by $c(A)$, admits an alternative description as

$$ c(A) = \|A^{-1}\|_E/\sqrt{n}, \tag{1} $$

where $\|\cdot\|_E$ is the Euclidean norm.
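As a numerical sanity check of (1), the following Python sketch (assuming NumPy; the helper names `c_r` and `c_via_inverse` are ours, not the paper's) computes $c_r(A)$ directly from singular values and compares $c_n(A)$ with $\|A^{-1}\|_E/\sqrt{n}$ on a random matrix. The rank tolerance is an arbitrary illustrative choice.

```python
import numpy as np

def c_r(A: np.ndarray, r: int) -> float:
    """Quadratic mean of the reciprocals of the r largest singular values of A.

    Returns infinity if A has fewer than r singular values above a small
    relative tolerance (the tolerance 1e-12 is an illustrative choice).
    """
    sigma = np.linalg.svd(A, compute_uv=False)  # sorted in decreasing order
    top = sigma[:r]
    if len(top) < r or top[-1] <= 1e-12 * max(sigma[0], 1.0):
        return float("inf")
    return float(np.sqrt(np.mean(1.0 / top**2)))

def c_via_inverse(A: np.ndarray) -> float:
    """Right-hand side of (1): ||A^{-1}||_E / sqrt(n) for a non-singular n x n matrix."""
    n = A.shape[0]
    return float(np.linalg.norm(np.linalg.inv(A), "fro") / np.sqrt(n))

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
print(c_r(A, 5), c_via_inverse(A))  # equal up to rounding error
```

The identity holds because $\|A^{-1}\|_E^2$ is the sum of the squared reciprocals of the singular values of $A$.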

The determinant problem is closely related to quantum phase estimation [16]. Using the
latter, it is possible to give a quantum algorithm solving the determinant problem for an $n\times n$
Hermitian matrix $H$ in $O(\sqrt{n}/\lambda_{\min}(H))$ applications of $e^{iH}$. This is worse than the bound
based on (1) if $H$ has a broad spectrum. We elaborate more on this subject in Section 3. It is also
possible to get a bound similar to (1) by applying a novel variable-time quantum amplitude
amplification algorithm by Ambainis [1], but our span program is more straightforward.

The paper is organized as follows. In Section 2, we give some basic notions from linear algebra
and describe span programs and their complexity measure, the witness size. In Section 3.1, some
basic concepts of quantum computation are given. The content of this section is included
for the purpose of comparison only, and is not used in the main part of the paper. We also
describe what a typical quantum algorithm for the determinant problem would look like if it
appeared in a graduate textbook on quantum computation. We compare it to our span-program-based
algorithm in Section 3.2. We conclude that, under a reasonable probability distribution
on matrices, our algorithm is asymptotically faster, almost surely. Another contribution of this
section is a result on the distribution of the trace of an inverse Wishart matrix. The result is of
independent interest and is used in the construction of a span program for the rank problem in
Section 4.2.
After that, we proceed with high level span programs in Section 4. These are the same span
programs as in Section 2.2, but with the assumption that we can query input vectors directly. In
Section 4.1 we define them, and in Section 4.2 we describe a high level span program for the rank
problem.

Unfortunately, the novelty of span programs has its downside: we cannot use the tools developed in
quantum computation. Hence, we define some low level machinery for span programs in Section 5,
which takes up a large part of the paper.

2 Preliminaries 

2.1 Linear Algebra 

For the basic notions of linear algebra, the reader may refer to [13]. We mostly work with real vector
spaces; the only exception is Section 3.1. We denote the inner product by $\langle x, y\rangle$. By $\|x\|$, we
denote the 2-norm of $x$.

If $A$ is an $n\times m$ matrix, we denote by $\|A\|$ its spectral norm $\max_{x\ne 0}\|Ax\|/\|x\|$. By $\|A\|_E$ we
denote its Euclidean (also known as Frobenius) norm

$$ \|A\|_E = \sqrt{\sum_{i,j} a_{ij}^2}. $$
Since $AA^T$ is positive semi-definite, all its eigenvalues are non-negative. The square roots of
these eigenvalues are known as the singular values of $A$. Any $n\times m$ matrix admits a singular value
decomposition $A = U\Sigma V^T$ with $U$ being an $n\times n$ orthogonal (i.e., unitary and real) matrix,
$V$ being an $m\times m$ orthogonal matrix, and $\Sigma$ being an $n\times m$ matrix with the singular values on the
"diagonal" and all other elements zero. The columns of $U$ are called the left singular vectors of
$A$, and the columns of $V$ are the right singular vectors of $A$.

The matrix norms defined above can be easily described using singular values. Let $\sigma_1, \ldots, \sigma_n$ be the
singular values of $A$. The spectral norm equals $\sigma_{\max}(A)$, i.e., the maximal singular value of $A$,
and the Euclidean norm equals $\sqrt{\sigma_1^2 + \cdots + \sigma_n^2}$. These equalities follow directly from the singular value
decomposition and the invariance of the 2-norm under orthogonal transformations.
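A quick numerical check of these identities, as a NumPy sketch on an arbitrary random $4\times 6$ matrix (the example matrix and seed are our own choices):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 6))          # an arbitrary real n x m matrix, n = 4, m = 6

U, s, Vt = np.linalg.svd(A)              # A = U Sigma V^T, s sorted in decreasing order
Sigma = np.zeros(A.shape)
Sigma[: len(s), : len(s)] = np.diag(s)   # singular values on the "diagonal"

spectral = np.linalg.norm(A, 2)          # max_x ||Ax|| / ||x||
euclidean = np.linalg.norm(A, "fro")     # square root of the sum of squared entries

print(np.allclose(U @ Sigma @ Vt, A))                # the decomposition reproduces A
print(np.isclose(spectral, s[0]))                    # spectral norm = sigma_max(A)
print(np.isclose(euclidean, np.sqrt(np.sum(s**2))))  # Euclidean norm = sqrt(sum sigma_i^2)
```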

2.2 Span Programs 

In this section, we define span programs, mostly following [20]. A span program is a way of
computing a Boolean function $\{0,1\}^m \to \{0,1\}$. It is defined by

• A finite-dimensional inner product space $V = \mathbb{R}^n$. Reichardt et al. define span programs
over $\mathbb{C}$, but we find real span programs more convenient. Real span programs are known to be
equivalent to complex ones [20, Lemma 4.11];

• A non-zero target vector $t \in V$;

• A set of input vectors $I \subset V$. The set $I$ is split into the union of the set of free input vectors
$I_{\mathrm{free}}$ and the collection of sets $\{I_{j,b}\}$ with $j = 1, \ldots, m$ and $b = 0, 1$: $I = I_{\mathrm{free}} \cup \bigcup_{j,b} I_{j,b}$. The
input vectors of $I_{j,b}$ are labeled by the tuple of the $j$-th input variable $x_j$ and its possible
value $b$.

For each input $x = (x_j) \in \{0,1\}^m$, define the set of available input vectors as $I(x) = I_{\mathrm{free}} \cup
\bigcup_{j=1}^{m} I_{j,x_j}$. Its complement $I \setminus I(x)$ is called the set of false input vectors. We say that $\mathcal{P}$ evaluates
to 1 on input $x$ if $t \in \operatorname{span}(I(x))$. In this way, span programs define total Boolean functions. One
can define a span program for a partial Boolean function as well, by ignoring the output of the
program on the complement of the domain.
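To make the definition concrete, here is a small Python sketch (assuming NumPy) that evaluates a real span program on an input $x$ by testing whether $t$ lies in $\operatorname{span}(I(x))$ with a least-squares residual. The data layout (a list of free vectors and a dictionary from labels $(j, b)$ to lists of vectors, with 0-based $j$) and the tolerance are illustrative choices of ours, not part of the definition.

```python
import numpy as np

def available_vectors(free, labelled, x):
    """Columns of I(x): all free vectors plus the vectors labelled (j, x_j)."""
    cols = list(free)
    for j, xj in enumerate(x):
        cols.extend(labelled.get((j, xj), []))
    return cols

def evaluate(t, free, labelled, x, tol=1e-9):
    """Return 1 iff the target t lies in span(I(x)), up to a numerical tolerance."""
    cols = available_vectors(free, labelled, x)
    if not cols:
        return 0  # the span of the empty set is {0}, and t is non-zero
    A_x = np.column_stack(cols)
    coeffs, *_ = np.linalg.lstsq(A_x, t, rcond=None)
    return int(np.linalg.norm(A_x @ coeffs - t) <= tol)

# AND of two bits: V = R^2, t = (1, 1), I_{0,1} = {e_1}, I_{1,1} = {e_2}.
t = np.array([1.0, 1.0])
labelled = {(0, 1): [np.array([1.0, 0.0])], (1, 1): [np.array([0.0, 1.0])]}
print([evaluate(t, [], labelled, x) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 0, 0, 1]
```

A free vector is always available, so adding $e_2$ to the free list would make the program above accept every input with $x_0 = 1$.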
A useful notion of complexity for a span program is that of witness size. Assume, up to the
end of the section, that a span program $\mathcal{P}$ calculates a partial Boolean function $f\colon \mathcal{D} \to \{0,1\}$ with
$\mathcal{D} \subseteq \{0,1\}^m$. Let $A$ and $A(x)$ be the matrices having $I$ and $I(x)$ as their columns, respectively.

If $\mathcal{P}$ evaluates to 1 on input $x \in \mathcal{D}$, a witness for this input is any vector $w \in \mathbb{R}^{|I(x)|}$ such that
$A(x)w = t$. The size of $w$ is defined as its squared norm $\|w\|^2$.

If, on the contrary, $f(x) = 0$, then a witness for this input is any vector $w' \in V$ such that $\langle w', t\rangle = 1$
and $w'$ is orthogonal to all vectors from $I(x)$. Since $t \notin \operatorname{span}(I(x))$, such a vector exists. The
size of $w'$ is defined as $\|A^T w'\|^2$. Note that this equals the sum of squares of the inner products of $w'$
with all false input vectors.

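The witness size $\mathrm{wsize}(\mathcal{P}, x)$ of span program $\mathcal{P}$ on input $x$ is defined as the minimal size among
all witnesses for $x$ in $\mathcal{P}$. We also use the notation

$$ \mathrm{wsize}_b(\mathcal{P}, \mathcal{D}) = \max_{x \in \mathcal{D}:\, f(x) = b} \mathrm{wsize}(\mathcal{P}, x). $$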

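The witness size of $\mathcal{P}$ is defined as

$$ \mathrm{wsize}(\mathcal{P}, \mathcal{D}) = \sqrt{\mathrm{wsize}_0(\mathcal{P}, \mathcal{D})\,\mathrm{wsize}_1(\mathcal{P}, \mathcal{D})}. $$

This is not a standard definition, but it appears as equation (2.8) in [20].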
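The following Python sketch (assuming NumPy) computes these quantities by brute force for a simple span program for $\mathrm{OR}_m$: one-dimensional $V$, target $t = 1$, and a single input vector $1$ labelled $(j, 1)$ for each $j$. The positive witness size uses the minimum-norm solution given by the pseudo-inverse; the negative one minimizes $\|A^T w'\|^2$ over the affine set of feasible $w'$. The example program and all function names are our own illustration. For $m = 4$, it should report $\mathrm{wsize}_0 = 4$, $\mathrm{wsize}_1 = 1$ and $\mathrm{wsize} = 2 = \sqrt{m}$, in line with the $O(\sqrt{m})$ complexity of Grover search.

```python
import itertools
import numpy as np

def positive_witness_size(A_x, t):
    """min ||w||^2 subject to A_x w = t (minimum-norm solution via the pseudo-inverse)."""
    w = np.linalg.pinv(A_x) @ t
    return float(w @ w)

def negative_witness_size(A_all, A_x, t):
    """min ||A_all^T w'||^2 subject to <w', t> = 1 and A_x^T w' = 0."""
    C = np.vstack([A_x.T, t[None, :]])      # linear constraints C w' = d
    d = np.zeros(C.shape[0])
    d[-1] = 1.0
    w0 = np.linalg.pinv(C) @ d              # a feasible point (exists since t is not in span A_x)
    _, s, Vt = np.linalg.svd(C)
    rank = int(np.sum(s > 1e-10))
    N = Vt[rank:].T                         # basis of the null space of C
    if N.shape[1] > 0:                      # move within the feasible set to minimize the size
        z, *_ = np.linalg.lstsq(A_all.T @ N, -A_all.T @ w0, rcond=None)
        w0 = w0 + N @ z
    r = A_all.T @ w0
    return float(r @ r)

def or_witness_sizes(m):
    """Brute-force wsize_0, wsize_1 and wsize for the span program of OR_m described above."""
    A_all = np.ones((1, m))                 # V = R^1; input vector 1 labelled (j, 1) for each j
    labels = [(j, 1) for j in range(m)]
    t = np.array([1.0])
    sizes = {0: 0.0, 1: 0.0}
    for x in itertools.product([0, 1], repeat=m):
        mask = np.array([x[j] == b for (j, b) in labels])
        A_x = A_all[:, mask]                # columns of I(x)
        if any(x):
            sizes[1] = max(sizes[1], positive_witness_size(A_x, t))
        else:
            sizes[0] = max(sizes[0], negative_witness_size(A_all, A_x, t))
    return sizes[0], sizes[1], float(np.sqrt(sizes[0] * sizes[1]))

print(or_witness_sizes(4))
```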

The following important theorem is a combination of results from [21] and [20], and it shows
why span programs are important for quantum computation:

Theorem 2. For any partial Boolean function $f\colon \{0,1\}^m \supseteq \mathcal{D} \to \{0,1\}$ and for any span
program $\mathcal{P}$ computing $f$, there exists a two-sided bounded-error quantum algorithm calculating $f$ in
$O(\mathrm{wsize}(\mathcal{P}, \mathcal{D}))$ queries.

Thus, a search for a good quantum query algorithm is essentially equivalent to a search for a 
span program with small witness size. 

3 Previous Results 

The main point of this section is to interpret Algorithm 1 in the context of known quantum
algorithms. In order to simplify the comparison, we limit ourselves to the determinant problem
as the most important special case.

In Section 3.1, we give a short exposition of the quantum algorithms we find relevant to the
determinant problem. The content of this section is not used in the proofs of the main results.
However, we utilize some of these results to prove lower bounds in Section 5.4.

Also, as a quantum algorithm for the determinant problem doesn't seem to appear in explicit
form in the literature, we give our own variant, Algorithm 4, based on standard quantum subroutines.
In Section 3.2, we compare the performance of Algorithms 1 and 4. In order to do this, we use
Gaussian matrices and prove a result on the distribution of the trace of an inverse Wishart matrix.
The content of this section is used later in the proof of Algorithm 1 in Section 4.2.

3.1 Quantum Query Algorithms 

For the basic concepts of quantum algorithms, the reader may refer to [19]. Query algorithms measure
the complexity of a problem by the number of queries to the input that the best algorithm has to make.
Clearly, this provides a lower bound on the time complexity. For many algorithms, query complexity
is easier to analyze than time complexity. For the definition of query complexity and its basic
properties, a good reference is [6].
One of the basic quantum algorithms is Grover search [11]. It is capable of computing the
OR function in $O(\sqrt{n})$ quantum queries. Its optimality was one of the first quantum lower bound
results. The Grover algorithm is optimal even for the unique search problem, where it is promised
that the input string contains at most one element set to '1'.

Theorem 3 ([5]). Any quantum algorithm discriminating the all-zero string from any string containing
exactly one '1' requires $\Omega(\sqrt{n})$ quantum queries.

One extension of Grover search is quantum amplitude amplification [3]. Assume $\mathcal{A}$ is a quantum
algorithm without measurements, and let one of its registers, $\chi$, represent the "goodness" of the
output: the output is considered good iff $\chi = 1$. Let $a$ be the probability of obtaining a good
output when the final state of $\mathcal{A}$ is measured in the standard basis. Then quantum amplitude
amplification allows one to boost this probability up to $\Omega(1)$ using $O(1/\sqrt{a})$ applications of $\mathcal{A}$.
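The $O(1/\sqrt{a})$ scaling can be seen from the standard analysis of amplitude amplification: writing $a = \sin^2\theta$, after $k$ rounds the success probability is $\sin^2((2k+1)\theta)$. The Python sketch below merely evaluates this textbook formula (it does not simulate a quantum circuit); the function names are ours.

```python
import math

def amplified_success(a: float, k: int) -> float:
    """Success probability after k rounds of amplitude amplification, given initial probability a."""
    theta = math.asin(math.sqrt(a))
    return math.sin((2 * k + 1) * theta) ** 2

def rounds_needed(a: float) -> int:
    """floor(pi / (4 theta)) rounds bring the success probability close to 1."""
    theta = math.asin(math.sqrt(a))
    return math.floor(math.pi / (4 * theta))

for n in (100, 10_000, 1_000_000):
    a = 1 / n                           # e.g. a good output with probability 1/n
    k = rounds_needed(a)                # roughly (pi / 4) * sqrt(n) rounds
    print(n, k, amplified_success(a, k))
```

Since $\theta \approx \sqrt{a}$ for small $a$, the number of rounds grows like $1/\sqrt{a}$.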

Two quantum subroutines are important for us. The first one is Hamiltonian simulation. The
problem is to (approximately) implement the unitary $e^{iH}$, where $H$ is a Hermitian matrix encoded
by the input variables. Recently, Childs and Kothari came up with an improved algorithm [7] for
the Hamiltonian simulation problem running in time $(d + \log^* N)\,d^2\|H\|\,(d\|H\|/\delta)^{o(1)}$, where $N$ is the
dimension of $H$, $d$ is the maximal number of non-zero entries in any row of $H$, and $\delta$ is the precision
of the algorithm. The algorithm additionally assumes that $H$ is efficiently row-computable, i.e., for
any row index $i$, one can get a $d$-list containing the indices of all non-zero elements in the $i$-th row of $H$.

Another important subroutine is quantum phase estimation [16]. Given a unitary $U$ and its
eigenvector $\psi$, the algorithm produces $\varphi$ such that $e^{i\varphi}$ is the eigenvalue corresponding to $\psi$. If $\varepsilon$ is the
error probability of the algorithm and $\delta$ is its precision (i.e., an answer within the $\pm\delta$ interval around the
true value is considered correct), the algorithm can be implemented using $O(\frac{1}{\delta}\log\frac{1}{\varepsilon})$ controlled
applications of $U$ [18]. By linearity, the algorithm can be extended to vectors $\psi$ that are not
necessarily eigenvectors of $U$.
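For intuition about the precision parameter, the following NumPy sketch computes the outcome distribution of textbook phase estimation with $t$ ancilla qubits. As an assumption of this sketch, we write the eigenvalue as $e^{2\pi i\varphi}$ with $\varphi \in [0, 1)$ (a rescaling of the convention above), and outcome $y$ stands for the estimate $y/2^t$; the function name is ours.

```python
import numpy as np

def qpe_outcome_distribution(phi: float, t: int) -> np.ndarray:
    """Probabilities of the outcomes y = 0, ..., 2^t - 1 of textbook phase estimation.

    The amplitude of |y> after the inverse Fourier transform is
    (1 / 2^t) * sum_k exp(2*pi*i*k*(phi - y / 2^t)).
    """
    M = 2**t
    k = np.arange(M)
    y = np.arange(M)
    amps = np.exp(2j * np.pi * np.outer(phi - y / M, k)).sum(axis=1) / M
    return np.abs(amps) ** 2

p = qpe_outcome_distribution(0.3, 5)     # 0.3 * 32 = 9.6 is not an integer
print(p.sum(), p[9] + p[10])             # total probability 1; the two nearest estimates dominate
```

When $2^t\varphi$ is an integer the estimate is exact; otherwise the mass concentrates on the nearest grid points, which is why the number of controlled applications scales with $1/\delta$.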

Since this does not seem to have been done explicitly before, we compose these quantum subroutines
into an algorithm for solving the determinant problem. Recall that in the determinant problem, we are
given an $n\times n$ matrix $A$ and should decide whether it has full rank. In Section 3.2, we compare this
algorithm to our span-program-based Algorithm 1.

Algorithm 4. The determinant problem can be solved in $O(\sqrt{n}\,LT)$ quantum queries with the
promise that every input matrix $A$ satisfies $\|A\| \le 1$ and every non-singular input matrix $A$ satisfies
$\sigma_{\min}(A) \ge 1/L$. Here, $T$ is the number of queries needed to simulate $A$ (see the proof).

Proof sketch. Let $\{\sigma_i\}$ be the singular values of $A$. Since $A$ need not be Hermitian, it cannot be used
directly in the Hamiltonian simulation subroutine. The solution is to replace $A$ by

$$ H = \begin{pmatrix} 0 & A \\ A^* & 0 \end{pmatrix}, $$

where $A^*$ is the conjugate transpose of $A$. The matrix $H$ is Hermitian and has $\{\pm\sigma_i\}$
as eigenvalues. Parameter $T$ in the statement of the algorithm describes the cost of simulating
$H$. Note also that if one wishes to apply sparse Hamiltonian simulation, it is enough for $H$ to
be efficiently row-computable, but this means that $A$ should be both efficiently row- and
column-computable.
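The eigenvalue claim for $H$ is easy to check numerically. The NumPy sketch below (our own illustration; the random example and the function name are assumptions) builds $H$ from a deliberately singular real matrix $A$, so that $A^* = A^T$, and compares the spectrum of $H$ with $\{\pm\sigma_i\}$.

```python
import numpy as np

def hermitian_dilation(A: np.ndarray) -> np.ndarray:
    """H = [[0, A], [A^*, 0]], a Hermitian matrix with eigenvalues +-sigma_i(A) for square A."""
    n, m = A.shape
    return np.block([[np.zeros((n, n)), A],
                     [A.conj().T, np.zeros((m, m))]])

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))
A[:, -1] = A[:, 0] + A[:, 1]           # make A singular (rank n - 1) on purpose

H = hermitian_dilation(A)
eig = np.sort(np.linalg.eigvalsh(H))
sigma = np.linalg.svd(A, compute_uv=False)
print(np.allclose(eig, np.sort(np.concatenate([sigma, -sigma]))))  # spectrum is {+-sigma_i}
print(np.sum(np.abs(eig) < 1e-9))      # a singular A gives eigenvalue 0 of H (twice here)
```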

By the norm bound on $A$, all eigenvalues of $H$ lie in $[-1, 1]$, so the unitary $e^{iH}$ has eigenvalue 1 iff $H$
has eigenvalue 0, i.e., iff $A$ is singular. The next step is to apply phase estimation to $e^{iH}$ with a
random vector as $\psi$. In order to distinguish between zero and non-zero eigenvalues of $H$, the precision $\delta$
of phase estimation should satisfy $\delta < 1/L$.

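Say the output of phase estimation is good if the phase $\varphi = 0$. If $A$ has rank $n - 1$, the
probability of obtaining a good output when measuring the final state is $1/n$ (the kernel of $H$ is then
two-dimensional, and a random unit vector in $\mathbb{C}^{2n}$ has expected squared projection $2/(2n)$ onto it).
If $A$ has full rank, this probability can be made much smaller.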
Using quantum amplitude amplification, we can boost the probability $1/n$ up to $\Omega(1)$ in $O(\sqrt{n})$
applications of phase estimation. This makes the total $O(\sqrt{n}\,L)$ applications of $e^{iH}$. □



3.2 Comparison of Algorithms 

We have seen two algorithms for the determinant problem: Algorithms 4 and 1. The first one
requires $O(\sqrt{n}/\sigma_{\min}(A))$ applications of $e^{iH}$, whereas the second one uses a number of queries equal
to $O(\sqrt{n}\,c(A))$ times the complexity of loading $A$ into the span program. We hold the view that
loading a matrix into a span program is an operation of the same flavor as simulating a Hamiltonian;
hence, we cancel out the corresponding factors in both complexity estimates.

Under this assumption, the complexity comparison boils down to comparing $1/\sigma_{\min}(A)$ and
$c(A)$. It is easy to see that the latter is at most the former, since a quadratic mean never exceeds
the largest of its terms, but it is not clear whether the improvement is significant. Of course, it
depends on the type of problem the algorithm is used to solve and the matrices appearing therein.
If the problem is not fixed, it is natural to compare performance on random instances. In this
section, we consider a natural probability distribution on matrices and show that, under this
distribution, $c(A)$ is asymptotically smaller than $1/\sigma_{\min}(A)$, with the ratio being $\Omega(\sqrt{n})$.

A Gaussian matrix $G(n,m)$ is an $n\times m$ random matrix with entries independently drawn
from the standard normal distribution $N(0,1)$. This is one of the natural probability distributions
on matrices. We prove that if $A$ is taken from $G(n,n)$, then, almost surely, $c(A)$ is asymptotically
smaller than $1/\sigma_{\min}(A)$.
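The comparison can be explored empirically. The following NumPy sketch (an illustration only; sample sizes, seed and names are arbitrary choices of ours) samples $A \sim G(n,n)$ and reports medians of $c(A)$ and $1/\sigma_{\min}(A)$, together with their ratio divided by $\sqrt{n}$. For every sample, $c(A) \le 1/\sigma_{\min}(A)$ holds deterministically.

```python
import numpy as np

def c_and_inverse_sigma_min(A: np.ndarray) -> tuple:
    """Return (c(A), 1 / sigma_min(A)) for a square matrix A."""
    s = np.linalg.svd(A, compute_uv=False)
    return float(np.sqrt(np.mean(1.0 / s**2))), float(1.0 / s[-1])

rng = np.random.default_rng(3)
for n in (16, 64, 256):
    samples = [c_and_inverse_sigma_min(rng.standard_normal((n, n))) for _ in range(20)]
    c_med = np.median([c for c, _ in samples])
    inv_med = np.median([inv for _, inv in samples])
    # Theorem 10 suggests c(A) stays bounded; by Theorem 8, 1/sigma_min(A) grows like sqrt(n).
    print(n, round(c_med, 2), round(inv_med, 2), round(inv_med / (c_med * np.sqrt(n)), 2))
```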

Later, we will need two well-known facts about normal distributions, which we state here:

Fact 5. A linear combination of independent normal distributions $c_1 N(0, \sigma_1^2) + c_2 N(0, \sigma_2^2)$ is the
normal distribution $N(0, c_1^2\sigma_1^2 + c_2^2\sigma_2^2)$.

Fact 6. The Gaussian matrix distribution $G(n,m)$ is invariant under multiplication by orthogonal
matrices both from the left and from the right.

If $A \sim G(n,m)$, then the distribution of $W = AA^T$ is known as the Wishart distribution
$\mathcal{W}(n,m)$. It is a probability distribution on symmetric $n\times n$ matrices. Parameter $m$ is known as
the number of degrees of freedom. The Wishart distribution is a generalization of the chi-square
distribution, since $\mathcal{W}(1,m)$ coincides with $\chi^2_m$. The Wishart distribution is important for us
because $\sigma_{\min}(A)^2 = \lambda_{\min}(W)$ and $c(A)^2 = \frac{1}{n}\mathrm{Tr}(W^{-1})$. The following is known about the Wishart
distribution. For Theorems 7 and 9, refer, e.g., to Section 3.2 of [17].

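Theorem 7. If $m > n + 1$, then $\mathrm{E}[W^{-1}] = \frac{I_n}{m-n-1}$ when $W \sim \mathcal{W}(n,m)$.

Theorem 8 (Edelman [10]). If $W \sim \mathcal{W}(n,n)$, then, as $n \to \infty$, $n\lambda_{\min}(W)$ converges in distribution
to a random variable with the probability density function

$$ f(x) = \frac{1+\sqrt{x}}{2\sqrt{x}}\, e^{-(x/2+\sqrt{x})}. \tag{2} $$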
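The density (2) can be integrated in closed form, which makes the limiting behaviour of $n\lambda_{\min}(W)$ explicit. Differentiating $1 - e^{-(x/2+\sqrt{x})}$ recovers (2), so the limiting distribution function is:

```latex
\frac{d}{dx}\Bigl(1 - e^{-(x/2+\sqrt{x})}\Bigr)
  = \Bigl(\frac{1}{2} + \frac{1}{2\sqrt{x}}\Bigr) e^{-(x/2+\sqrt{x})}
  = \frac{1+\sqrt{x}}{2\sqrt{x}}\, e^{-(x/2+\sqrt{x})},
\qquad\text{hence}\qquad
\lim_{n\to\infty}\Pr\bigl[n\lambda_{\min}(W) \le x\bigr] = 1 - e^{-(x/2+\sqrt{x})}.
```

For instance, since $1 - e^{-u} \le u$, the limiting probability $\lim_{n\to\infty}\Pr[\lambda_{\min}(W) < \delta_2/n]$ is below $\varepsilon/2$ whenever $\delta_2/2 + \sqrt{\delta_2} < \varepsilon/2$.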



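The Cholesky decomposition of $\mathcal{W}(n,m)$ is also well understood:

Theorem 9 (Bartlett decomposition). A Wishart matrix $\mathcal{W}(n,m)$ equals $TT^T$, where

$$ T = \begin{pmatrix} t_1 & 0 & \cdots & 0 \\ t_{21} & t_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ t_{n1} & t_{n2} & \cdots & t_n \end{pmatrix} \tag{3} $$

with $t_i \sim \chi_{m-i+1}$ and $t_{ij} \sim N(0,1)$ independently.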
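A Monte Carlo sketch (NumPy; sample sizes, seed and helper names are our own choices) that samples $\mathcal{W}(n,m)$ both directly as $AA^T$ and through the Bartlett factor (3), and compares the average of $\mathrm{Tr}(W^{-1})$ with the value $n/(m-n-1)$ predicted by Theorem 7. This is an empirical illustration, not a proof.

```python
import numpy as np

def bartlett_factor(n: int, m: int, rng) -> np.ndarray:
    """Lower-triangular T from (3): t_i ~ chi_{m-i+1} on the diagonal, N(0, 1) below it."""
    T = np.tril(rng.standard_normal((n, n)), k=-1)
    dof = m - np.arange(n)                       # m - i + 1 for i = 1, ..., n
    T[np.arange(n), np.arange(n)] = np.sqrt(rng.chisquare(dof))
    return T

def wishart_direct(n: int, m: int, rng) -> np.ndarray:
    A = rng.standard_normal((n, m))              # A ~ G(n, m)
    return A @ A.T

def wishart_bartlett(n: int, m: int, rng) -> np.ndarray:
    T = bartlett_factor(n, m, rng)
    return T @ T.T

def mean_trace_inverse(sampler, n: int, m: int, trials: int, rng) -> float:
    return float(np.mean([np.trace(np.linalg.inv(sampler(n, m, rng))) for _ in range(trials)]))

rng = np.random.default_rng(4)
n, m, trials = 3, 20, 20_000
est_direct = mean_trace_inverse(wishart_direct, n, m, trials, rng)
est_bartlett = mean_trace_inverse(wishart_bartlett, n, m, trials, rng)
print(est_direct, est_bartlett, n / (m - n - 1))  # Theorem 7: E[Tr W^{-1}] = n / (m - n - 1)
```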
In particular, Theorem 8 means that $\sigma_{\min}(A) = O(1/\sqrt{n})$ almost surely, and this settles the
complexity estimate for the quantum phase estimation algorithm. The following theorem gives
an estimate on the distribution of $\mathrm{Tr}(W^{-1})$ and shows that $c(A)$, almost surely, remains bounded
by a constant. Interestingly, we were not able to find a similar statement proven prior to our paper.

Theorem 10. For any $\varepsilon > 0$, there exists $\delta$ such that, for any $n$, $\Pr[c(A) > \delta] < \varepsilon$ when
$A \sim G(n,n)$.

Proof. Let $W = AA^T$. One way to approach the theorem would be to compute the expectation of the
trace of $W^{-1}$ and then apply Markov's inequality. Unfortunately, we cannot apply Theorem 7
directly (its assumption $m > n + 1$ fails for $m = n$), and there is a good reason for that: the
expectation does not exist. This is mostly because the expectation of the inverse of a $\chi^2_k$ random
variable is infinite for $k \le 2$. We resolve this complication, broadly speaking, by using Theorem 8
for the smallest two singular values of $A$ and applying Theorem 7 to the rest of the matrix.

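We utilize the Bartlett decomposition. Thus, let $T$ be as in (3) with $t_i \sim \chi_{n-i+1}$, so that
$W = TT^T$. Then $A$ and $T$ have the same singular values (both are the square roots of the eigenvalues
of $W$), and the same holds for $A^{-1}$ and $T^{-1}$. In particular, because of (1), we have

$$ n\,c(A)^2 = \|T^{-1}\|_E^2. \tag{4} $$

To estimate the latter, represent $T$ in the block lower triangular form

$$ T = \begin{pmatrix} T_{11} & 0 \\ T_{21} & T_{22} \end{pmatrix}, $$

where $T_{11}$ and $T_{22}$ are square matrices of dimensions $n - 2$ and $2$, respectively. Then calculate the
inverse

$$ T^{-1} = \begin{pmatrix} T_{11}^{-1} & 0 \\ -T_{22}^{-1}T_{21}T_{11}^{-1} & T_{22}^{-1} \end{pmatrix}. \tag{5} $$

Denote $W_{11} = T_{11}T_{11}^T$ and notice that it is the Bartlett decomposition of $\mathcal{W}(n-2, n)$. Hence, by
Theorem 7 and the linearity of expectation,

$$ \mathrm{E}\big[\|T_{11}^{-1}\|_E^2\big] = \mathrm{E}\big[\mathrm{Tr}\, W_{11}^{-1}\big] = n - 2. \tag{6} $$

The norm of a row of a matrix is bounded by its largest singular value (i.e., its spectral norm).
Applying this to the last two rows of $T^{-1}$, one gets

$$ \|T_{22}^{-1}T_{21}T_{11}^{-1}\|_E^2 + \|T_{22}^{-1}\|_E^2 \le 2\,\sigma_{\max}(T^{-1})^2 = \frac{2}{\lambda_{\min}(W)}. \tag{7} $$

Applying Markov's inequality to (6), one obtains a constant $\delta_1$ such that

$$ \Pr\big[\|T_{11}^{-1}\|_E^2 > \delta_1 n\big] < \varepsilon/2. \tag{8} $$

On the other hand, by Theorem 8, there exists $\delta_2 > 0$ such that

$$ \Pr\big[\lambda_{\min}(W) < \delta_2/n\big] < \varepsilon/2. \tag{9} $$

By (4), (5) and (7), $n\,c(A)^2 \le \|T_{11}^{-1}\|_E^2 + 2/\lambda_{\min}(W)$. Combining this with (8) and (9), we get

$$ \Pr\big[c(A)^2 > \delta_1 + 2/\delta_2\big] < \varepsilon, $$

independently of $n$, so $\delta = \sqrt{\delta_1 + 2/\delta_2}$ satisfies the theorem. □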
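Formula (5) and bound (7) are purely deterministic, so they can be spot-checked numerically. The NumPy sketch below (our own illustration, with an arbitrary random lower-triangular $T$ whose diagonal is kept away from zero) does so.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 6
# A random lower-triangular T with a positive diagonal bounded away from zero.
T = np.tril(rng.standard_normal((n, n)), k=-1) + np.diag(rng.uniform(0.5, 2.0, n))
T11, T21, T22 = T[: n - 2, : n - 2], T[n - 2 :, : n - 2], T[n - 2 :, n - 2 :]

inv11, inv22 = np.linalg.inv(T11), np.linalg.inv(T22)
block_inverse = np.block([[inv11, np.zeros((n - 2, 2))],
                          [-inv22 @ T21 @ inv11, inv22]])          # formula (5)
print(np.allclose(block_inverse, np.linalg.inv(T)))

# Bound (7): the last two rows of T^{-1} have squared Euclidean norm at most 2 / lambda_min(T T^T).
bottom_sq = np.sum(block_inverse[n - 2 :, :] ** 2)
lam_min = np.linalg.eigvalsh(T @ T.T)[0]                           # eigenvalues in ascending order
print(bottom_sq <= 2 / lam_min)
```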
4 High Level Span Programs



In this section, we define high level span programs. The idea behind this definition is the assumption
that we can query the input vectors of a span program directly. Recall that in an ordinary span program
(which we also call an actual, or low level, span program), the set of possible input
vectors is fixed, and the input Boolean variables specify which of them become
available to the span program. For simplicity, we consider real span programs only; complex span
programs can be simulated by real ones using standard techniques.

Of course, querying vectors is not a commonly admitted operation. In Section 5, we give a
reduction of a high level span program to an ordinary span program. We call this operation matrix
loading.

4.1 Definition 

The specification of a high level span program $\mathcal{P}$ consists of

• two positive integers $n$ and $m$. The first one specifies the dimension of the vector space,
and the second one specifies the number of input vectors. We represent the input vectors as the
columns of an $n\times m$ matrix $A$. The $j$-th input vector is denoted by $a_j$;

• a fixed non-zero target vector $t \in \mathbb{R}^n$;

• a set of valid input matrices $\mathcal{D}$;

• a subspace of free input vectors $F$. It is an optional ingredient of a span program, but it is
often easier to describe a span program using free input vectors. Usually, $F$ is given as the
span of a finite set of free input vectors.

The span program distinguishes whether the affine subspace $t + F$ intersects the linear span of the
columns of $A$ (we denote the latter subspace by $\operatorname{span}A$) or not. Clearly, it is hard to distinguish the
case where a vector lies in a subspace from the case where the subspace is perturbed slightly so that it
no longer contains the vector. Because of this, and inspired by the definitions from Section 2.2, we
define a complexity measure of a high level span program called witness size.

If $A \in \mathcal{D}$ is such that $t + F$ intersects $\operatorname{span}A$, we define its witness as any vector $w \in \mathbb{R}^m$ such
that $Aw \in t + F$. If, on the contrary, $A$ is such that $t + F$ and $\operatorname{span}A$ do not intersect, we define its
witness as a vector $w' \in \mathbb{R}^n$ with the properties $\langle w', t\rangle = 1$, $w' \perp \operatorname{span}A$ and $w' \perp F$. We call the
former case a positive one, and the latter case a negative one.

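The witness size of a valid input $A$ is defined as

$$ \mathrm{wsize}(\mathcal{P}, A) = \min\{\|w\|^2 \mid w \text{ is a witness for } A \text{ in program } \mathcal{P}\}. $$

The positive- and negative-input witness sizes are defined as

$$ \mathrm{wsize}_1(\mathcal{P}) = \max_{A \in \mathcal{D}:\,(t+F)\cap \operatorname{span}A \ne \emptyset} \mathrm{wsize}(\mathcal{P}, A) \quad\text{and}\quad \mathrm{wsize}_0(\mathcal{P}) = \max_{A \in \mathcal{D}:\,(t+F)\cap \operatorname{span}A = \emptyset} \mathrm{wsize}(\mathcal{P}, A). $$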

And finally, the witness size of $\mathcal{P}$ is defined as the geometric mean of its positive- and negative-input
witness sizes:

$$ \mathrm{wsize}(\mathcal{P}) = \sqrt{\mathrm{wsize}_0(\mathcal{P})\,\mathrm{wsize}_1(\mathcal{P})}. $$

It is easy to see that the result of the program (but not the witness size!) remains unchanged
if we rescale the input vectors. So, for convenience, we will later assume that all entries of the
input matrix are from the interval $[-1, 1]$.
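The two kinds of witnesses can be computed with orthogonal projections. In the following NumPy sketch (an illustration under our own conventions: $F$ is passed as a matrix whose columns span it, and the tolerances are arbitrary), a positive witness of minimal norm is the minimum-norm solution of $P(Aw - t) = 0$, where $P$ projects onto $F^\perp$. A negative witness must lie in the orthogonal complement of $\operatorname{span}A + F$; if $Q$ is the projector onto that complement, Cauchy-Schwarz shows the optimal choice is $w' = Qt/\|Qt\|^2$, of size $1/\|Qt\|^2$. The last example illustrates the remark above: rescaling $A$ keeps the outcome but changes the witness size.

```python
import numpy as np

def complement_projector(B: np.ndarray) -> np.ndarray:
    """Orthogonal projector onto the orthogonal complement of the column span of B."""
    n = B.shape[0]
    if B.size == 0:
        return np.eye(n)
    U, s, _ = np.linalg.svd(B, full_matrices=True)
    rank = int(np.sum(s > 1e-10 * max(s[0], 1.0)))
    basis = U[:, :rank]
    return np.eye(n) - basis @ basis.T

def high_level_witness(A: np.ndarray, t: np.ndarray, F: np.ndarray):
    """Return ("positive", min ||w||^2) or ("negative", min ||w'||^2) for a high level span program."""
    P = complement_projector(F)                 # removes the free directions
    w = np.linalg.pinv(P @ A) @ (P @ t)         # min-norm w with P(Aw - t) = 0, if one exists
    if np.allclose(P @ A @ w, P @ t):
        return "positive", float(w @ w)
    Q = complement_projector(np.hstack([A, F]))
    q = Q @ t                                   # optimal negative witness is q / ||q||^2
    return "negative", float(1.0 / (q @ q))

A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])   # span A = span{e_1, e_2}
t = np.array([1.0, 1.0, 1.0])
no_free = np.zeros((3, 0))
free_e3 = np.array([[0.0], [0.0], [1.0]])
print(high_level_witness(A, t, no_free))       # t + {0} misses span A: negative
print(high_level_witness(A, t, free_e3))       # t + span{e_3} meets span A: positive
print(high_level_witness(2 * A, t, free_e3))   # same outcome, smaller witness size
```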

The following theorem is a restatement of the matrix loading subroutines given in Section 5.
Theorem 11. Any high level span program $\mathcal{P}$ can be solved by a quantum query algorithm of
complexity $O(\mathrm{wsize}(\mathcal{P})\,T)$, where $T$ is the complexity of loading matrix $A$ into the span program.



4.2 Span Program for Rank Problem 

In this section, we give a high level span program for the rank problem, Algorithm 1. Recall that
in the rank problem, we are given an $n\times m$ matrix $A$ and an integer $0 < r \le n$. The task is to
detect whether $\operatorname{rank}A \ge r$. In general, one may assume $n \le m$ because, otherwise, the matrix can
be transposed, reducing the complexity of the algorithm. Also, recall the definition of the complexity
measure $c_r(A)$ from the introduction:

$$ c_r(A) = \sqrt{\frac{1}{r}\sum_{i=1}^{r}\frac{1}{\sigma_i^2}}, $$

where $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$ are the $r$ largest singular values of $A$. We repeat the statement of
the algorithm here:



Algorithm 1. The rank problem can be solved in $O(\sqrt{r(n-r+1)}\,LT)$ quantum queries with the
promise that every entry of the input matrix $A$ is bounded by 1 in absolute value, and every input
matrix of rank at least $r$ satisfies $c_r(A) \le L$. Here $T$ is the cost of loading an $n\times m$ matrix
into a span program.

Proof. Denote $s = n - r$. Given vectors $a_1, \ldots, a_m$, we add $s$ free input vectors $v_1, v_2, \ldots, v_s$ and
check whether a random vector $t$ is contained in the span of $\{a_1, \ldots, a_m, v_1, \ldots, v_s\}$. The idea is that if
the rank of $A$ is less than $r$, adding $s$ vectors to it won't make it full rank. On the contrary,
if $\operatorname{rank}A \ge r$, the vectors $a_1, \ldots, a_m, v_1, \ldots, v_s$ span the whole space with probability 1. We generate
$v_1, \ldots, v_s$ and $t = (t_i)$ by letting their entries be independent standard Gaussians.
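The construction is easy to simulate classically for small sizes. The NumPy sketch below (names, sizes, seed and tolerance are illustrative assumptions) adds $s = n - r$ Gaussian free vectors and tests whether a Gaussian target $t$ lies in the resulting span; it only checks which case occurs, not the quantum complexity.

```python
import numpy as np

def rank_program_accepts(A: np.ndarray, r: int, rng, tol: float = 1e-8) -> bool:
    """Add s = n - r Gaussian free vectors v_1..v_s and test whether a Gaussian
    target t lies in span{a_1, ..., a_m, v_1, ..., v_s} (least-squares residual)."""
    n = A.shape[0]
    V = rng.standard_normal((n, n - r))
    t = rng.standard_normal(n)
    B = np.hstack([A, V])
    x, *_ = np.linalg.lstsq(B, t, rcond=None)
    return bool(np.linalg.norm(B @ x - t) <= tol * np.linalg.norm(t))

rng = np.random.default_rng(6)
n, m, r = 5, 6, 3
A_rank3 = rng.standard_normal((n, 3)) @ rng.standard_normal((3, m))   # rank 3 = r
A_rank2 = rng.standard_normal((n, 2)) @ rng.standard_normal((2, m))   # rank 2 < r
print(rank_program_accepts(A_rank3, r, rng), rank_program_accepts(A_rank2, r, rng))
```

With probability 1 the first call accepts and the second rejects, mirroring the two cases of the proof.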

At first, we consider the case $\operatorname{rank}A \ge r$ and estimate the witness size. For simplicity, we
assume the rank of $A$ is exactly $r$; otherwise, the witness size can only decrease. Let $\sigma_1, \ldots, \sigma_r$
be the largest singular values of $A$. Let $V = (v_{ij})$ be the matrix with the $v_j$'s as columns. Thus,
$V \sim G(n,s)$. Because of Fact 6, we may assume that the elements of the standard basis $e_1, \ldots, e_n$ are
equal to the left singular vectors of $A$. Hence, $A = \Sigma_A O_A$, where $\Sigma_A$ is the $n\times m$ matrix with $\sigma_1, \ldots, \sigma_r$
on the "diagonal" (and all other elements zero) and $O_A$ is an $m\times m$ orthogonal matrix. Let $\bar V$
be the $s\times s$ matrix formed by the last $s$ rows of $V$. Similarly, we denote by $\bar t$ the last $s$ elements
of $t$.

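We proceed as follows. At first, we find $x \in \mathbb{R}^s$ such that $\bar V x = \bar t$. Then $t' = t - Vx$ is in the
span of $A$ (its last $s$ coordinates vanish). Thus, we search for $w$ such that $Aw = t'$. The witness size
of $A$ is at most $\|w\|^2$. According to Theorem 10, we choose $\delta$, independently of $n$ and $s$, such that

$$ \Pr[c(\bar V) \le \delta] \ge 11/12, \tag{10} $$

and condition on this event, i.e., we assume $c(\bar V)$ is bounded by $\delta$. Fix $\bar V$ and denote its singular values
by $\sigma_{r+1}, \ldots, \sigma_n$. Again, we may assume $e_{r+1}, \ldots, e_n$ are the left singular vectors of $\bar V$. Hence
$\bar V = \Sigma_V O_V$ with $\Sigma_V = \mathrm{diag}(\sigma_{r+1}, \ldots, \sigma_n)$ and $O_V$ orthogonal. Under the assumption on $c(\bar V)$,
the vector $x$ with the property $\bar V x = \bar t$ satisfies

$$ \mathrm{E}\big[\|x\|^2\big] = \mathrm{E}\Big[\sum_{i=r+1}^{n}\frac{t_i^2}{\sigma_i^2}\Big] = \sum_{i=r+1}^{n}\frac{1}{\sigma_i^2} = s\,c(\bar V)^2 = O(s), $$

since, for the standard Gaussian $t_i$, one has $\mathrm{E}[t_i^2] = 1$.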
Now fix $x = (x_j)$. The first $r$ coordinates of $t$ and of the $v_j$'s are independent of $x$. Hence, each
of the first $r$ coordinates of $t' = (t'_i) = t - Vx$, as a linear combination of independent normal
distributions, by Fact 5 has distribution $N(0, 1 + \|x\|^2)$. In particular, unfixing $x$, we have

$$ \mathrm{E}\big[{t'_i}^2\big] = 1 + \mathrm{E}\big[\|x\|^2\big] = O(1 + s). $$

And, finally, for $w$ satisfying $Aw = t'$, by the linearity of expectation, we have

$$ \mathrm{E}\big[\|w\|^2\big] = \mathrm{E}\Big[\sum_{i=1}^{r}\frac{{t'_i}^2}{\sigma_i^2}\Big] = O\big((1+s)\,r\,c_r(A)^2\big). $$

By Markov's inequality, $\Pr\big[\|w\|^2 > 12\,\mathrm{E}[\|w\|^2]\big] \le 1/12$. Combining this with (10) and the
assumption on $c_r(A)$, we have that $\|w\|^2 = O\big((n - r + 1)\,rL^2\big)$ with probability at least $5/6$.

Now assume $A$ has rank less than $r$. In this case, $A$, together with the $v_j$'s, does not span the whole
space. We can assume $e_1$ is orthogonal to the span of $\{a_1, \ldots, a_m, v_1, \ldots, v_s\}$. Then the negative witness
$w' = e_1/t_1$, which lies in the one-dimensional subspace spanned by $e_1$, has size $1/t_1^2$. It is $O(1)$ with
probability $5/6$.

In order to solve the rank problem, we execute the span program with the bound $O((n - r + 1)\,rL^2)$
on $\mathrm{wsize}_1$ and $O(1)$ on $\mathrm{wsize}_0$. We require the error probability of the span-program-solving
algorithm to be at most $1/6$. The total error probability is then less than $1/3$, and the query complexity,
by Theorem 11, is $O(\sqrt{r(n-r+1)}\,LT)$. □

The complexity of Algorithm 1 is optimal, at least with respect to the $O(\sqrt{r(n-r+1)})$ factor. We show
this using the Hamming-weight threshold function $T^n_r\colon \{0,1\}^n \to \{0,1\}$ defined in [20] by

$$ T^n_r(x) = \begin{cases} 1, & |x| \ge r; \\ 0, & \text{otherwise}; \end{cases} $$

where $|x|$ is the Hamming weight of $x = (x_i)$. It is known that any quantum algorithm for $T^n_r$
requires $\Omega(\sqrt{r(n-r+1)})$ queries, and there exists a (low level) span program for this function
with witness size at most $\sqrt{r(n-r+1)}$ [20].

The threshold function can be reduced to the rank problem by considering the matrix $A_x =
\mathrm{diag}(x)$. It is clear that $\operatorname{rank}A_x \ge r$ if and only if $T^n_r(x) = 1$. For each $A_x$ of rank at least $r$,
$c_r(A_x) = 1$. The complexity of loading $A_x$ into the span program is $O(\log n)$, as shown in
Section 5.
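The reduction can be checked exhaustively for small $n$. In the following Python sketch (assuming NumPy; function names are ours), we verify for all $x \in \{0,1\}^n$ that $\operatorname{rank}\mathrm{diag}(x) \ge r$ iff $T^n_r(x) = 1$, and that $c_r(A_x) = 1$ whenever the rank is at least $r$.

```python
import itertools
import numpy as np

def threshold(x, r: int) -> int:
    """T^n_r(x) = 1 iff the Hamming weight of x is at least r."""
    return int(sum(x) >= r)

def check_reduction(n: int, r: int) -> bool:
    for x in itertools.product([0, 1], repeat=n):
        A_x = np.diag(np.asarray(x, dtype=float))             # A_x = diag(x), rank |x|
        if (np.linalg.matrix_rank(A_x) >= r) != bool(threshold(x, r)):
            return False
        if threshold(x, r):
            sigma = np.linalg.svd(A_x, compute_uv=False)[:r]  # r largest singular values
            if not np.isclose(np.sqrt(np.mean(1.0 / sigma**2)), 1.0):  # c_r(A_x) = 1
                return False
    return True

print(check_reduction(5, 3))
```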

Hence, the $\sqrt{r(n-r+1)}$ factor in the statement of Algorithm 1 is tight up to a logarithmic
factor. Also, Algorithm 1 itself can be considered a generalization of the span program for the
threshold function.

5 Matrix Loading Subroutines 

In this section, we develop machinery to deal with high level span programs. We give two
subroutines, vector loading and demultiplexor, and then compose them to reduce a high level span
program to an actual span program. Span program composition has been developed before, but
one difference of our subroutines is that they produce vectors, not Boolean variables as
in [20].

We give, in total, three different variants of realizing a high level span program by an actual 
one, each with different complexity and different assumptions on matrix A and its accessibility via 
queries. 
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One more convention should be made. In all these subroutines, elements being queried are 
integers and real numbers. Since span programs have been developed for Boolean variables only, 
a representation of numbers using Boolean variables should be chosen. So, from now forth, it is 
always assumed that an integer c in range [0,n — 1] is represented by fc = [logn] Boolean variables 
co,ci, . . . ,Cfc_i, so that 

fe-i 

c = 5Zci2'. (11) 

All real numbers are assumed to be in the range $[-1, 1]$. Any such number $x$ is defined using Boolean variables $x_0, x_1, \dots, x_k$, where $k$ is some predefined precision parameter, and it holds that

$$x = \sum_{i=0}^{k} x_i 2^{-i} - 1. \qquad (12)$$
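The encoding (12) can be sketched the same way. The Python code below is an illustration under stated assumptions: the digits are obtained by truncation (the text does not fix a rounding rule), the function names are hypothetical, and floating point stands in for exact reals. With digits $x_0, \dots, x_k$ the representable values are $-1 + j2^{-k}$ for $j = 0, \dots, 2^{k+1} - 1$, so the truncation error is at most $2^{-k}$.

```python
def real_to_bits(x, k):
    """Digits x_0, ..., x_k with sum_i x_i 2^{-i} - 1 <= x, error at most 2^{-k} (truncation)."""
    if not -1.0 <= x <= 1.0:
        raise ValueError("x must lie in [-1, 1]")
    y = x + 1.0  # shift to [0, 2]; digit i has weight 2^{-i}
    bits = []
    for i in range(k + 1):
        weight = 2.0 ** (-i)
        b = 1 if y >= weight else 0
        bits.append(b)
        y -= b * weight
    return bits


def bits_to_real(bits):
    """Equation (12): x = sum_i x_i 2^{-i} - 1."""
    return sum(b * 2.0 ** (-i) for i, b in enumerate(bits)) - 1.0
```

For example, `real_to_bits(0.5, 2)` gives `[1, 1, 0]`, since $1 + \tfrac12 - 1 = 0.5$.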

5.1 Vector producing subroutines 

Our goal is to start with a high level span program $\mathcal{P}$ and end up with a low level span program $\mathcal{P}_{\mathrm{low}}$ calculating the same function. Once this is done, one can apply Theorem 2 to the latter. Some input vectors of the high level program can be free. In this case, we add them to the set $I_{\mathrm{free}}$ of free vectors of $\mathcal{P}_{\mathrm{low}}$. If an input vector is actually queried, we have to implement this query using the Boolean variables representing the vector. We do this in a number of steps, where each step involves composing a vector producing subroutine into a high level span program. Each composition transforms a high level span program into another high level span program; see Figure 1. In this figure, $\ell$ compositions are performed: each $\mathcal{P}_i$, for $i = 0, \dots, \ell - 1$, is composed with $\mathcal{S}_i$ to give $\mathcal{P}_{i+1}$. The initial high level span program $\mathcal{P}_0$ contains only free input vectors. The final result of the composition is $\mathcal{P}_\ell = \mathcal{P}$. The actual span program $\mathcal{P}_{\mathrm{low}}$ is the union of $\mathcal{P}_0$ and all the $\mathcal{S}_i$'s.




Figure 1: Composing vector producing subroutines into high level span programs. Bullets represent composition operations, $\mathcal{P}_\ell$ is the final high level span program, and the boxed items are the components of the actual span program $\mathcal{P}_{\mathrm{low}}$.

Consider the operation of composing $\mathcal{S}$ into $\mathcal{P}_1$ in order to get $\mathcal{P}_2$. The vector space of $\mathcal{P}_{\mathrm{low}}$ contains three vector spaces:

• the vector space $V_2$ of $\mathcal{P}_2$. It contains all input vectors of $\mathcal{P}_2$;

• the input space $V_1$ of $\mathcal{S}$. It contains additional dimensions that input vectors of $\mathcal{P}_1$ are allowed to use, i.e., the composition operation can shrink the vector space of the high level span program. Some reference basis of $V_1$ is fixed. When specifying the composition, a position of $V_1$ relative to $V_2$ is chosen. The vector space of $\mathcal{P}_1$ becomes $V_1 + V_2$;

• the working space $V_3$ of $\mathcal{S}$. It is orthogonal to $V_1 + V_2$. The input vectors of $\mathcal{P}_{\mathrm{low}}$ labeled by the input variables used in $\mathcal{S}$ lie inside $(V_1 + V_2) \oplus V_3$.
Thus, the vector space of the actual span program $\mathcal{P}_{\mathrm{low}}$ is the direct sum of the vector space of $\mathcal{P}_0$ and the working spaces of all the $\mathcal{S}_i$'s. All these subspaces are pairwise orthogonal. There are some additional components of the subroutine to be specified:

• a number of pivots, which are vectors identified with elements of $V_2$ during composition. The subroutine does not affect the components of input vectors of $\mathcal{P}_1$ outside the span of the pivots and the input space;

• input variables. These are the Boolean variables labeling the input vectors used in $\mathcal{S}$.

The action of the subroutine depends on whether the final outcome of $\mathcal{P}$ is positive or negative. If it is positive, the input vectors of $\mathcal{S}$ are used to transform input vectors of $\mathcal{P}_1$ into input vectors of $\mathcal{P}_2$. The resulting input vectors should be contained in $V_2$. Also, the witness $w$ of $\mathcal{P}_2$ is transformed into a witness $\tilde w$ of $\mathcal{P}_1$.

If the final result is negative, the witness $w'$ of $\mathcal{P}_2$ gets extended onto $(V_1 + V_2) \oplus V_3$ so that it is still a negative witness, i.e., it is orthogonal to all input vectors of $\mathcal{P}_1$ and all available input vectors of $\mathcal{S}$. If $V_1$ is orthogonal to $V_2$, this should always be doable (i.e., the subroutine is responsible for that). If $V_1$ intersects $V_2$, it should additionally be ensured that this extension is possible.

The total witness size of $\mathcal{P}_{\mathrm{low}}$ breaks naturally into the individual costs of the $\mathcal{S}_i$'s. That is, the cost of $\mathcal{S}$ is the contribution to the witness size from the input vectors of $\mathcal{S}$. It depends on the witness of $\mathcal{P}_2$.

For the rest of the section, we use the notations $w = (w_j)$, $w' = (w'_j)$, $\tilde w = (\tilde w_j)$ and $\tilde w' = (\tilde w'_j)$ for the positive and negative witnesses of $\mathcal{P}_2$ and $\mathcal{P}_1$, respectively. The last notation $\tilde w'$ will also be used for the negative witness of the whole program, i.e., including the working space of $\mathcal{S}$.

5.2 Dense matrices 

We start with a simple vector loading subroutine: 

Subroutine 12 (Vector Loading). There exists a subroutine producing a vector whose coordinates in the pivots of the subroutine are specified by the input real numbers. The specification of the subroutine is in Table 1.



Pivots:            a system of $n$ vectors $e_1, e_2, \dots, e_n$.
Input variables:   $n$ real numbers $\{x_i\}$, $i = 1, \dots, n$.
Input space:       none.
Result:            additional input vector $a_j = \sum_{i=1}^{n} x_i e_i$; let its index in $\mathcal{P}_2$ be $j$.
Positive witness:  the $j$-th coordinate of $w$ gets removed.
Positive cost:     $O(n w_j^2)$.
Negative witness:  does not change.
Negative cost:     $O\big(\sum_{i=1}^{n} (w', e_i)^2\big)$.

Table 1: Specification of vector loading subroutine



Proof. Recall that each $x_i$ is given by the sequence of input Boolean variables $\{x_{i,a}\}$ that specify $x_i$ as in (12). Let the basis of the working space be $\{f_{i,a}\}$, with the same ranges of $i$ and $a$ as in $\{x_{i,a}\}$. For each input digit $x_{i,a}$ and each of its possible values $b \in \{0, 1\}$, define the input vector

$$v_{i,a,b} = b\,2^{-a/2} e_i - f_{i,a},$$

labeled by the value $b$ of $x_{i,a}$.
Additionally, define a free input vector

$$v = \sum_{i=1}^{n} \sum_{a=0}^{k} 2^{-a/2} f_{i,a} - \sum_{i=1}^{n} e_i.$$

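For the positive case, we show how to construct the resulting vector $a_j$ from the available input vectors. We do this by letting the coefficient of $v_{i,a,x_{i,a}}$ be equal to $2^{-a/2}$, and the coefficient of $v$ be equal to 1.

Indeed, this linear combination equals

$$\sum_{i=1}^{n}\sum_{a=0}^{k} 2^{-a/2} v_{i,a,x_{i,a}} + v = \sum_{i=1}^{n}\sum_{a=0}^{k} x_{i,a} 2^{-a} e_i - \sum_{i=1}^{n}\sum_{a=0}^{k} 2^{-a/2} f_{i,a} + \Big(\sum_{i=1}^{n}\sum_{a=0}^{k} 2^{-a/2} f_{i,a} - \sum_{i=1}^{n} e_i\Big) = \sum_{i=1}^{n}\Big(\sum_{a=0}^{k} x_{i,a} 2^{-a} - 1\Big) e_i = \sum_{i=1}^{n} x_i e_i = a_j. \qquad (13)$$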

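In order to get $w_j a_j$ for $\mathcal{P}_2$, all coefficients of (13) should be multiplied by $w_j$. Since the free vector $v$ does not contribute, this gives the cost of the subroutine equal to

$$\sum_{i=1}^{n}\sum_{a=0}^{k} w_j^2 2^{-a} < 2n w_j^2.$$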

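For the negative case, we extend the witness $w'$ to the witness $\tilde w'$ of the resulting program by letting

$$(\tilde w', f_{i,a}) = x_{i,a} 2^{-a/2} (w', e_i).$$

It is trivially orthogonal to all the $v_{i,a,x_{i,a}}$'s. It is also orthogonal to $v$, since

$$\Big(\tilde w', \sum_{i=1}^{n}\sum_{a=0}^{k} 2^{-a/2} f_{i,a} - \sum_{i=1}^{n} e_i\Big) = \sum_{i=1}^{n}\sum_{a=0}^{k} 2^{-a} x_{i,a} (w', e_i) - \sum_{i=1}^{n} (w', e_i) = \sum_{i=1}^{n} x_i (w', e_i) = (w', a_j) = 0.$$

The inner product of $\tilde w'$ with the false input vector labeled by $x_{i,a}$, i.e., with $v_{i,a,1-x_{i,a}}$, equals

$$\big(\tilde w', (1 - x_{i,a}) 2^{-a/2} e_i - f_{i,a}\big) = (w', e_i)(1 - x_{i,a}) 2^{-a/2} - x_{i,a} 2^{-a/2} (w', e_i) = (1 - 2x_{i,a}) 2^{-a/2} (w', e_i).$$

Since $(1 - 2x_{i,a})^2 = 1$, the total contribution to the witness size is no more than $2\sum_{i=1}^{n} (w', e_i)^2$. □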
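The proof can be sanity-checked numerically. The Python sketch below assumes the vectors exactly as written above, $v_{i,a,b} = b\,2^{-a/2} e_i - f_{i,a}$ and the free vector $v$; sparse vectors are dicts keyed by hypothetical labels `("e", i)` and `("f", i, a)`, and indices start at 0 rather than 1. It checks that $w_j$ times (13) yields $w_j a_j$ at a cost below $2n w_j^2$, and that the extended negative witness is orthogonal to all available vectors while the false vectors contribute at most $2\sum_i (w', e_i)^2$.

```python
import random
from collections import defaultdict

# Sparse vectors are dicts keyed by ("e", i) for the pivot e_i and ("f", i, a)
# for the working-space vector f_{i,a}; indices are 0-based here.


def add(u, v, s=1.0):
    """Return the sparse vector u + s * v."""
    out = defaultdict(float, u)
    for key, val in v.items():
        out[key] += s * val
    return dict(out)


def dot(u, v):
    """Inner product of two sparse vectors."""
    return sum(val * v.get(key, 0.0) for key, val in u.items())


def input_vector(i, a, b):
    """v_{i,a,b} = b 2^{-a/2} e_i - f_{i,a}, labelled by the value b of x_{i,a}."""
    return {("e", i): b * 2 ** (-a / 2), ("f", i, a): -1.0}


def free_vector(n, k):
    """The free vector v = sum_{i,a} 2^{-a/2} f_{i,a} - sum_i e_i."""
    v = {("f", i, a): 2 ** (-a / 2) for i in range(n) for a in range(k + 1)}
    v.update({("e", i): -1.0 for i in range(n)})
    return v


def positive_combination(digits, w_j):
    """w_j times the combination (13); returns the vector and the cost of the subroutine."""
    n, k = len(digits), len(digits[0]) - 1
    total, cost = {}, 0.0
    for i in range(n):
        for a in range(k + 1):
            coef = w_j * 2 ** (-a / 2)
            total = add(total, input_vector(i, a, digits[i][a]), coef)
            cost += coef ** 2
    # The free vector v is added with coefficient w_j and costs nothing.
    return add(total, free_vector(n, k), w_j), cost


def negative_extension(digits, y):
    """Extend w' with y[i] = (w', e_i) by (w~', f_{i,a}) = x_{i,a} 2^{-a/2} y[i].

    Returns w~' and the total contribution of the false vectors v_{i,a,1-x_{i,a}}."""
    wt = {("e", i): y_i for i, y_i in enumerate(y)}
    for i, row in enumerate(digits):
        for a, d in enumerate(row):
            wt[("f", i, a)] = d * 2 ** (-a / 2) * y[i]
    cost = sum(
        dot(wt, input_vector(i, a, 1 - d)) ** 2
        for i, row in enumerate(digits)
        for a, d in enumerate(row)
    )
    return wt, cost
```

Note that `negative_extension` only produces a valid witness when $\sum_i x_i (w', e_i) = (w', a_j) = 0$, which holds for any negative witness $w'$ of $\mathcal{P}_2$.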

Applying this procedure $m$ times gives the following result, which is applicable to any $n \times m$ matrix if we assume the matrix is given as a table of $nm$ real numbers.

Subroutine 13. Any high-level span program $\mathcal{P}$ for $n \times m$ matrices can be implemented as an actual span program with witness size at most $O(\sqrt{nm}\,\mathrm{wsize}(\mathcal{P}))$, i.e., the complexity of the subroutine is $O(\sqrt{nm})$.

Proof. Use $m$ instances $\{\mathcal{L}_j\}$ of the vector loading subroutine with $e_1, \dots, e_n$ being the standard basis of the vector space of $\mathcal{P}$. Take the input variables for the subroutine $\mathcal{L}_j$ from the $j$-th column of $A$.

In the positive case, the contribution of $\mathcal{L}_j$ is $O(n w_j^2)$, where $w = (w_j)$ is the witness. Hence, the positive witness size of the composed program is $O(n\|w\|^2)$.

In the negative case, with witness $w'$, the contribution of each subroutine is $O(\|w'\|^2)$, because $\{e_i\}$ is a basis. Hence, the negative witness size is $O(m\|w'\|^2)$.

The total witness size can be obtained as the geometric mean of the maximal positive and negative witness sizes. □
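Spelled out, the last step is the following computation, where, as in the proof, $\mathrm{wsize}(\mathcal{P})$ is taken to be the geometric mean of the maximal positive and maximal negative witness sizes of the high level program (the maxima range over positive and negative inputs, respectively).

```latex
\mathrm{wsize}(\mathcal{P}_{\mathrm{low}})
  = \sqrt{\max_{\text{pos.}} O\big(n\|w\|^2\big)\cdot\max_{\text{neg.}} O\big(m\|w'\|^2\big)}
  = O\big(\sqrt{nm}\big)\sqrt{\max_{\text{pos.}}\|w\|^2\cdot\max_{\text{neg.}}\|w'\|^2}
  = O\big(\sqrt{nm}\,\mathrm{wsize}(\mathcal{P})\big).
```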
5.3 Sparse matrices

To deal with sparse matrices, we introduce one more subroutine. 

Subroutine 14 (Demultiplexor). There exists a subroutine capable of replacing the specified unit vector of the input space by one of the pivots, namely, the one whose index is specified by the input integer variable. The complete specification of the subroutine is in Table 2.



Pivots:            vectors $e_0, e_1, \dots, e_{n-1}$.
Input variables:   an integer $c$ in the range $[0, n-1]$.
Input space:       one-dimensional, spanned by a unit vector $g$.
Result:            each input vector $\tilde a_j$ of $\mathcal{P}_1$ gets transformed into the input vector $a_j = \tilde a_j + (\tilde a_j, g)(e_c - g)$ of $\mathcal{P}_2$.
Positive witness:  does not change.
Positive cost:     $O\big((\log n) \sum_j w_j^2 (\tilde a_j, g)^2\big)$.
Negative witness:  the witness $w'$ gets extended to the witness $\tilde w'$ of $\mathcal{P}_1$ by letting $(\tilde w', g) = (w', e_c)$.
Negative cost:     $O\big((\log n) \sum_{i=0}^{n-1} (w', e_i)^2\big)$.

Table 2: Specification of demultiplexor



The logarithmic factors in the positive and negative costs arise from encoding $c$ in binary. We believe it is possible to avoid this logarithmic overhead by querying $c$ directly, but we do not yet have a way to realize such a query in a span program. The subroutine can also be used in the opposite direction, to replace $e_c$ by $g$. Functionally, the subroutine is then a multiplexor. We use it in this mode in Subroutine 16.

Proof. For simplicity, assume $n = 2^k$; otherwise just truncate the resulting subroutine. Recall that $c$ is given as in (11).

For $a = 0, 1, \dots, k$ and $\ell = 0, 1, \dots, 2^a - 1$, define the vector $f^{(a)}_\ell$ as follows. The vector $f^{(0)}_0$ is equal to $g$, and $f^{(k)}_\ell$ is equal to $e_\ell$. For the other values of $a$, the vectors $f^{(a)}_\ell$ form an orthonormal basis of the working space.

For each $a = 0, \dots, k-1$, $b \in \{0, 1\}$ and $\ell \in \{0, 1, \dots, 2^a - 1\}$, define the input vector

$$v_{a,b,\ell} = f^{(a+1)}_{\ell + b2^a} - f^{(a)}_\ell \qquad (14)$$

that is labeled by the value $b$ of the variable $c_a$.
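In the positive case, it is easy to see that

$$e_c - g = \sum_{a=0}^{k-1} v_{a,\,c_a,\,c \bmod 2^a},$$

because $c \bmod 2^a + c_a 2^a = c \bmod 2^{a+1}$, so the sum telescopes from $f^{(0)}_0 = g$ to $f^{(k)}_c = e_c$. Then this vector can be used to replace each $\tilde a_j$ by $\tilde a_j + (\tilde a_j, g)(e_c - g)$. This gives the specified positive complexity.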
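The telescoping identity behind the positive case can be checked mechanically. In this Python sketch the tree vectors $f^{(a)}_\ell$ are represented by hypothetical labels `(a, l)`, so that `(0, 0)` plays the role of $g$ and `(k, l)` the role of $e_\ell$; only the available vectors $v_{a,c_a,c \bmod 2^a}$ from (14) are summed.

```python
def input_vector(a, b, l):
    """Equation (14): v_{a,b,l} = f^{(a+1)}_{l + b 2^a} - f^{(a)}_l, labelled by c_a = b."""
    return {(a + 1, l + b * 2 ** a): 1, (a, l): -1}


def sum_available(c, k):
    """Sum of v_{a, c_a, c mod 2^a} over a = 0, ..., k-1, with zero entries dropped."""
    total = {}
    for a in range(k):
        c_a = (c >> a) & 1
        for key, val in input_vector(a, c_a, c % 2 ** a).items():
            total[key] = total.get(key, 0) + val
    return {key: val for key, val in total.items() if val != 0}
```

For every $c$ in $[0, 2^k - 1]$ the result is `{(k, c): 1, (0, 0): -1}`, i.e., $e_c - g$.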

In the negative case, let $w'$ be the witness in $\mathcal{P}_2$ that we extend to the witness $\tilde w'$ of the composed program. All the $f^{(a)}_\ell$'s form a full binary tree in a natural way, with $g$ at the root, the $e_\ell$'s at the leaves, and with two vertices $f^{(a)}_\ell$ and $f^{(a+1)}_{\ell'}$ connected iff there is an input vector of the form (14) that is the difference of the two. We say an edge is available or false if the corresponding input vector is available or false, respectively.

For each $f^{(a)}_\ell$, there is a unique $e_i$ accessible from $f^{(a)}_\ell$ via available edges. Let $(\tilde w', f^{(a)}_\ell)$ be equal to $(w', e_i)$. In particular, $(\tilde w', g) = (w', e_c)$. It is easy to see that $\tilde w'$ is orthogonal to all available input vectors of the subroutine. This is because an available edge connects two vertices with the same $e_i$ chosen.

It remains to estimate the contribution of $\mathcal{S}$ to the witness size. Each false edge contributes the square of the difference of the values of the two vertices it connects, i.e.,

$$\big((w', e_i) - (w', e_{i'})\big)^2 \le 2(w', e_i)^2 + 2(w', e_{i'})^2$$

for some $i$ and $i'$. It is easy to see that each vertex is incident to at most 2 false edges and each $e_i$ appears in at most $k + 1$ vertices. Hence, the total witness size contribution is at most

$$(4k + 4) \sum_{i=0}^{n-1} (w', e_i)^2.$$ □
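The negative-case construction can be sketched in the same model. The Python code below (hypothetical names, tree vertices labelled `(a, l)` as in $f^{(a)}_\ell$) assigns to every vertex the value $(w', e_i)$ of the leaf reached by descending along available edges, checks that every available edge joins equal values, and compares the contribution of the false edges with the bound $(4k+4)\sum_i (w', e_i)^2$. Descending is one valid way to find the unique leaf accessible via available edges, since each internal vertex has exactly one available edge to a child.

```python
import random


def leaf_below(a, l, c, k):
    """Index i of the leaf e_i reached from f^{(a)}_l by following available edges downwards."""
    while a < k:
        l += ((c >> a) & 1) << a  # the available child uses b = c_a
        a += 1
    return l


def extend_negative(y, c, k):
    """Values (w~', f^{(a)}_l) given y[i] = (w', e_i); returns (values, false-edge cost, max available gap)."""
    value = {(a, l): y[leaf_below(a, l, c, k)] for a in range(k + 1) for l in range(2 ** a)}
    false_cost, max_gap = 0.0, 0.0
    for a in range(k):
        for l in range(2 ** a):
            for b in (0, 1):
                # Inner product of w~' with v_{a,b,l} from (14).
                gap = value[(a + 1, l + b * 2 ** a)] - value[(a, l)]
                if b == (c >> a) & 1:
                    max_gap = max(max_gap, abs(gap))
                else:
                    false_cost += gap ** 2
    return value, false_cost, max_gap
```

The root receives `y[c]`, matching $(\tilde w', g) = (w', e_c)$, and `max_gap` is zero, i.e., $\tilde w'$ is orthogonal to every available input vector.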

By adding this subroutine to Subroutine 13, it is easy to get a variant of the latter for sparse input vectors, i.e., matrices with sparse columns.

Subroutine 15. Any high-level span program $\mathcal{P}$ for $n \times m$ matrices can be implemented as an actual span program with witness size at most $O(k\sqrt{m}\,\mathrm{wsize}(\mathcal{P}) \log n)$, assuming the following requirements for the matrix $A = (a_{ij})$:

• there are at most $k$ non-zero elements in each column of $A$;

• the $j$-th column of $A$ is represented by integers $c_{i,j}$ and reals $x_{i,j}$ with $i = 1, \dots, k$ so that $a_{c_{i,j},j} = x_{i,j}$, and this specifies all non-zero elements of $A$.
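To make the input format of Subroutine 15 concrete, the following Python sketch (hypothetical helper names, 0-based indices) builds the lists $c_{i,j}$ and $x_{i,j}$ from a dense matrix. Columns with fewer than $k$ non-zero elements are padded with positions of zero entries, an assumption that keeps $a_{c_{i,j},j} = x_{i,j}$ true for every slot; this requires $k \le n$.

```python
def to_sparse_columns(A, k):
    """Lists c[j] and x[j] of length k with A[c[j][i]][j] == x[j][i], covering all non-zeros."""
    n, m = len(A), len(A[0])
    c, x = [], []
    for j in range(m):
        nonzero = [i for i in range(n) if A[i][j] != 0]
        if len(nonzero) > k:
            raise ValueError(f"column {j} has more than {k} non-zero elements")
        zeros = [i for i in range(n) if A[i][j] == 0]
        rows = nonzero + zeros[: k - len(nonzero)]  # pad with zero entries
        c.append(rows)
        x.append([A[i][j] for i in rows])
    return c, x


def from_sparse_columns(c, x, n):
    """Rebuild the dense n x m matrix; entries not listed are zero."""
    A = [[0] * len(c) for _ in range(n)]
    for j, (rows, vals) in enumerate(zip(c, x)):
        for i, val in zip(rows, vals):
            A[i][j] = val
    return A
```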

Proof. Extend the linear space of $\mathcal{P}$ with $m$ pairwise orthogonal $k$-dimensional subspaces $\{U_j\}$. Use $m$ vector loading subroutines $\{\mathcal{L}_j\}$ to load, for each fixed $j$, the vector given by the $x_{i,j}$'s into $U_j$. Then use $km$ demultiplexors $\mathcal{D}_{i,j}$ to replace the $i$-th coordinate of $U_j$ by the $c_{i,j}$-th coordinate of $\mathcal{P}$. In particular, the pivots of each $\mathcal{D}_{i,j}$ are the elements of the standard basis of the space of $\mathcal{P}$.

Consider the positive case first. After composing $\mathcal{L}_j$, the subspace $U_j$ contains a "compressed version" of $v_j$. The witness is the same as in $\mathcal{P}$, as demultiplexors do not affect positive witnesses. Hence, the cost of $\mathcal{L}_j$ is $O(k w_j^2)$. The cost of $\mathcal{D}_{i,j}$ is $O(w_j^2 x_{i,j}^2 \log n) = O(w_j^2 \log n)$, because $x_{i,j} \in [-1, 1]$. Thus, the total positive witness size is $O(k\|w\|^2 \log n)$.

In the negative case, the cost of each $\mathcal{D}_{i,j}$ is $O(\|w'\|^2 \log n)$, again because the pivots form a basis. The cost of each $\mathcal{L}_j$ equals $O(\|\mathrm{proj}_j w'\|^2)$, where $\mathrm{proj}_j$ is the orthogonal projection onto the space spanned by the $e_{c_{i,j}}$ with $i = 1, \dots, k$. Clearly, it is at most $O(\|w'\|^2)$. Hence, the total negative witness size is $O(km\|w'\|^2 \log n)$.

Taking the geometric mean, one obtains the total witness size of the actual span program. □ 
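For completeness, the geometric mean in the last step reads as follows, with $\mathrm{wsize}(\mathcal{P})$ again the geometric mean of the maximal positive and negative witness sizes of the high level program, and each factor maximized over the respective inputs.

```latex
\sqrt{O\big(k\|w\|^2\log n\big)\cdot O\big(km\|w'\|^2\log n\big)}
  = O\big(k\sqrt{m}\,\log n\big)\,\|w\|\,\|w'\|
  \;\Longrightarrow\;
  \mathrm{wsize}(\mathcal{P}_{\mathrm{low}}) = O\big(k\sqrt{m}\,\mathrm{wsize}(\mathcal{P})\log n\big).
```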

A bit more effort is required to get a matrix loading subroutine for sparse matrices: 

Subroutine 16. Any high-level span program $\mathcal{P}$ for $n \times m$ matrices can be implemented as an actual span program with witness size at most $O\big(\sqrt{k\ell}\,(k + \mathrm{wsize}(\mathcal{P}) \log(m + n))\big)$, assuming the following requirements for the matrix $A = (a_{ij})$:

• there are at most $k$ non-zero elements in each column of $A$;

• there are at most $\ell$ non-zero elements in each row of $A$;

• the $j$-th column of $A$ is represented by integers $c_{i,j}$ and reals $x_{i,j}$ with $i = 1, \dots, k$ so that $a_{c_{i,j},j} = x_{i,j}$, and this specifies all non-zero elements of $A$; and

• for each $i = 1, \dots, n$ there is a list of integers $d_{i,1}, \dots, d_{i,\ell}$ that specifies the indices of all non-zero elements in the $i$-th row of $A$. The list may also contain indices of zero elements.
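Similarly, the row lists $d_{i,1}, \dots, d_{i,\ell}$ of Subroutine 16 can be produced as in the Python sketch below (hypothetical name, 0-based indices). Since the text allows the lists to contain indices of zero elements, short rows are padded with such indices; this assumes $\ell \le m$.

```python
def row_lists(A, l):
    """For each row i, exactly l column indices that include every non-zero of row i."""
    d = []
    for i, row in enumerate(A):
        nonzero = [j for j, v in enumerate(row) if v != 0]
        if len(nonzero) > l:
            raise ValueError(f"row {i} has more than {l} non-zero elements")
        zeros = [j for j, v in enumerate(row) if v == 0]
        d.append(nonzero + zeros[: l - len(nonzero)])  # pad with zero positions
    return d
```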
Proof. Similarly to the proof of Subroutine 15, we define $m$ $k$-dimensional vector subspaces $\{U_j\}$. To these, we add $m$ $n$-dimensional subspaces $\{W_j\}$. Let $f_{i,j}$ and $h_{i,j}$ be the $i$-th elements of the standard bases of $U_j$ and $W_j$, respectively. We compose

• $m$ vector loading subroutines $\mathcal{L}_j$ to load the vectors specified by the $x_{i,j}$'s into the $U_j$'s;

• $km$ demultiplexors $\mathcal{D}_{i,j}$ replacing $f_{i,j}$ by $h_{c_{i,j},j}$;

• $n\ell$ demultiplexors $\mathcal{M}_{i,j}$ replacing $e_i$ by $h_{i,d_{i,j}}$. Note that these latter demultiplexors work in the direction opposite to the one of Subroutine 14, so they are, actually, multiplexors.

Let us analyse the witness size of this span program. Consider the positive case first. The various subroutines contribute the following costs towards the total witness size. The $j$-th vector loading subroutine $\mathcal{L}_j$ contributes $O(k w_j^2)$. Next, by a similar argument as in the proof of Subroutine 15, the cost of $\mathcal{D}_{i,j}$ is $O(w_j^2 \log n)$. After composing all the $\mathcal{L}_j$'s and $\mathcal{D}_{i,j}$'s, each $W_j$ contains the input vector $v_j$. Then, these vectors are moved into the vector space of $\mathcal{P}$ using the $\mathcal{M}_{i,j}$'s. One can estimate the cost of the $\mathcal{M}_{i,j}$'s by noticing that they collect precisely everything put by the $\mathcal{D}_{i,j}$'s into the $W_j$'s. Hence, the total cost of all the $\mathcal{M}_{i,j}$'s is $O(k\|w\|^2 \log m)$. Summing everything up, we see that the positive witness size of the program is $O(k\|w\|^2 \log(m + n))$.

Let us now consider the negative case. Let, as usual, $w'$ be the witness. First, we describe what the witness $\tilde w'$ of the composed program looks like, and then calculate the witness size. All non-zero components of $\tilde w'$ are given by $(\tilde w', h_{i,d_{i,j}}) = (w', e_i)$ for $j = 1, \dots, \ell$ and $(\tilde w', f_{i,j}) = (w', e_{c_{i,j}})$ for $i = 1, \dots, k$.

Since for each $i$ and $j = 1, \dots, \ell$ the condition on the negative witness in Table 2 for $\mathcal{M}_{i,j}$ is fulfilled (both sides of the equality equal $(w', e_i)$), the witness can be successfully extended to the working space of $\mathcal{M}_{i,j}$, and its contribution towards the witness size is

$$O\big((w', e_i)^2 \log m\big).$$

Since there are $\ell$ multiplexors for each $i$, the total contribution of all the $\mathcal{M}_{i,j}$'s is $O(\ell\|w'\|^2 \log m)$.

Similarly, the cost of $\mathcal{D}_{i,j}$ equals $O(\log n)$ times the squared norm of the projection of $\tilde w'$ onto $W_j$. Hence, it is not hard to see that the total contribution of all the $\mathcal{D}_{i,j}$'s is $O(k\ell\|w'\|^2 \log n)$. By the same argument, the total contribution of all the $\mathcal{L}_j$'s is $O(\ell\|w'\|^2)$. Hence, finally, the total negative witness size is $O(\ell(k + \log(m + n))\|w'\|^2)$.

One can obtain the witness size of the whole program by taking the geometric mean. □ 

5.4 Lower bounds 

In this section, we show that the $O(\sqrt{n})$ and $O(\sqrt{m})$ factors in Subroutines 13 and 15 are optimal. We do this by converting a high level span program for some problem into a quantum algorithm, using the corresponding matrix loading subroutines. Then we use the known lower bound for the quantum algorithm. In fact, we will use only one quantum lower bound: Theorem 3.

The first problem is as follows: we are given $m$ $n$-tuples consisting of $\pm 1$'s. Assume $n$ is even, and it is promised that each tuple either contains only 1's, or contains equal numbers of 1's and $-1$'s. The question is to detect whether there is a tuple containing only 1's. This is a combination of the Grover and Deutsch-Jozsa [8] problems.

Clearly, a high level span program with target vector $t = (1, 1, \dots, 1)$ solves the problem. The positive witness size is at most 1, and the negative witness size is at most $1/n$, as $w' = t/n$ is a negative witness. Hence, the total witness size is $O(1/\sqrt{n})$. But this problem, as a search problem, requires $\Omega(\sqrt{m})$ quantum queries. Hence, the complexity of the loading subroutine should be at least $\Omega(\sqrt{nm})$, which matches the result of Subroutine 13.
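The two witnesses used in this argument are easy to verify directly. The Python sketch below (hypothetical helper names; exact `Fraction` arithmetic) checks that $w' = t/n$ satisfies $(w', t) = 1$, is orthogonal to every balanced tuple, and has $\|w'\|^2 = 1/n$, while an all-ones tuple coincides with $t$ and so gives a positive witness of size 1.

```python
from fractions import Fraction


def inner(u, v):
    """Standard inner product."""
    return sum(a * b for a, b in zip(u, v))


def negative_witness_size(tuples):
    """Return ||w'||^2 for w' = t/n if it is a negative witness for these tuples, else None."""
    n = len(tuples[0])
    t = [1] * n
    w = [Fraction(1, n)] * n
    if inner(w, t) != 1 or any(inner(w, col) != 0 for col in tuples):
        return None
    return inner(w, w)


def has_all_ones(tuples):
    """Positive case: an all-ones tuple equals t, so coefficient 1 on it is a witness of size 1."""
    return any(all(v == 1 for v in col) for col in tuples)
```

Combining the witness sizes 1 and $1/n$ as a geometric mean gives $\sqrt{1 \cdot 1/n} = 1/\sqrt{n}$.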

One objection to this bound could be that the $O(\sqrt{n})$ factor coincides with the norm of the vectors being loaded into the span program, and that it might be needed only for dealing with such long vectors. In fact, this is not so: we show that this factor is necessary even if we require the vectors to have norm $O(1)$. We show this in the special case $m = 1$.

Assume the vector space of a high level span program is $\mathbb{R}^{n+1}$, and let $e_0, \dots, e_n$ be an orthonormal basis. The target vector $t$ equals $e_0$. The search for a '1' in a bit string $x = (x_i)$ can be implemented by giving the input vector $e_0 + \sum_{i : x_i = 1} e_i$ to the span program. If all $x_i$'s are zeroes, the input vector equals the target vector, and the witness size is 1. Otherwise, if one of the input variables, say $x_j$, equals 1, $w' = e_0 - e_j$ is a negative witness of size 2. Hence, the total witness size is $O(1)$. If we require that at most one input variable can be set to 1, the norm of the input vector is bounded by $\sqrt{2}$, but the complexity of loading the input matrix still has to be $\Omega(\sqrt{n})$, in order to match the lower bound for the unique search problem.
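The second construction can be checked in the same way. In the Python sketch below (hypothetical names), the span program lives in $\mathbb{R}^{n+1}$ with $t = e_0$, the single input vector is $e_0 + \sum_{i : x_i = 1} e_i$, and for an input with some $x_j = 1$ the vector $w' = e_0 - e_j$ is tested as a negative witness.

```python
import math


def search_witness(x):
    """Return (kind, witness size, norm of the input vector) for the bit string x = (x_1, ..., x_n)."""
    u = [1] + list(x)                     # e_0 + sum_{i: x_i = 1} e_i
    norm = math.sqrt(sum(v * v for v in u))
    if not any(x):
        return "positive", 1, norm        # u equals t = e_0, coefficient 1
    j = x.index(1) + 1                    # some j with x_j = 1
    w = [0] * len(u)
    w[0], w[j] = 1, -1                    # candidate w' = e_0 - e_j
    if w[0] != 1 or sum(a * b for a, b in zip(w, u)) != 0:
        return "invalid", None, norm      # (w', t) = 1 and (w', u) = 0 must hold
    return "negative", sum(a * a for a in w), norm
```

Under the promise that at most one $x_i$ equals 1, the returned norm never exceeds $\sqrt{2}$.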

What is still unclear is whether the $O(\sqrt{mn})$ bound of Subroutine 13 is tight if all columns of the matrix $A$ have norm $O(1)$ and $m = \dots$. Also, it is an open problem whether the dependence on $k$ and $\ell$ in Subroutines 15 and 16 can be improved.